Corpus-Based Rules for Czech Verb Discontinuous Constituents
Abstract
In this paper we present a method for extracting general verb-group structures from a tagged and fully disambiguated corpus and for subsequently using these structures to build a formal grammar in the Prolog DCG style. Our goal is to apply them as rules for analyzing Czech verb groups in non-disambiguated, grammatically tagged Czech corpus texts. We also address the problem of recognizing discontinuous verb constituents in Czech and present the statistical data obtained.
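As a rough illustration of what applying such a rule to tagged text could look like, the following minimal Python sketch matches one verb-group pattern while tolerating intervening words. It is not the authors' grammar: the tag names (`AUX`, `PPART`, `PRON`, `ADV`), the single rule, the `max_gap` limit, and the function name `match_verb_group` are all assumptions made for this example, and the greedy search stands in for the backtracking a Prolog DCG would provide.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word form, simplified tag); the tag set is hypothetical

# Hypothetical rule: a past-tense verb group is an auxiliary followed,
# possibly after a gap, by a past participle.
RULE = ["AUX", "PPART"]


def match_verb_group(tokens: List[Token], rule: List[str],
                     max_gap: int = 3) -> Optional[List[int]]:
    """Return the indices of tokens that realize `rule` in order.

    Up to `max_gap` other tokens may stand between two consecutive rule
    elements, which is how the discontinuity is tolerated. The search is
    greedy (first match wins) and does not backtrack.
    """
    for start, (_, tag) in enumerate(tokens):
        if tag != rule[0]:
            continue
        indices = [start]
        pos = start
        for wanted in rule[1:]:
            # Look at the next max_gap + 1 positions after the last match.
            window = range(pos + 1, min(pos + 2 + max_gap, len(tokens)))
            found = next((i for i in window if tokens[i][1] == wanted), None)
            if found is None:
                break
            indices.append(found)
            pos = found
        else:
            # Every rule element was found in order.
            return indices
    return None


# "Já jsem to včera udělal" (I did it yesterday): the auxiliary "jsem" and
# the participle "udělal" are separated by two other words.
sentence = [("Já", "PRON"), ("jsem", "AUX"), ("to", "PRON"),
            ("včera", "ADV"), ("udělal", "PPART")]
group = match_verb_group(sentence, RULE)
```

Here `group` holds the positions of the auxiliary and the participle, so the two parts of the verb group can be reported together even though they are not adjacent. Lowering `max_gap` makes the matcher stricter about how far apart the parts may be.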
Similar resources
Recognition and Tagging of Compound Verb Groups in Czech
In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present a method for automatic recognition of compound verb groups in Czech. From an annotated corpus, 126 definite clause grammar rules were constructed. These rules describe all compound verb...
Applying Licenser Rules to a Grammar with Continuous Constituents
Licenser rules were originally introduced in Müller (1999) as part of a grammar based on discontinuous constituents. We propose licenser rules as a means to avoid underspecified empty elements in grammars with continuous constituents. We applied them to a verb movement analysis of the German main clause with right sentence bracket and to complement extraposition. To reduce the number of ...
Non-projectivity and valency
We describe the results of an investigation of a specific type of discontinuous construction, namely non-projective constructions concerning verbs and their arguments. This topic is especially important for languages with a relatively free word order, such as Czech, which is the language we have primarily worked with. For comparison, we have included some results for English. The corpora used for bot...
Continuous or Discontinuous Constituents?
During the last years, several grammarians have argued for linguistic descriptions of language that use the con- [...] has shown that in the worst case 2^n constituents can be built for an input string of length n if discontinuous constituents are allowed. As Carroll (1994) has demonstrated, such theoretical values are not of much help when it comes to practical systems. In the following I will compar...
Parsing String Generating Hypergraph Grammars
A string generating hypergraph grammar is a hyperedge replacement grammar where the resulting language consists of string graphs, i.e. hypergraphs modeling strings. With the help of these grammars, string languages like a^n b^n c^n can be modeled that cannot be generated by context-free grammars for strings. They are well suited to model discontinuous constituents in natural languages, i.e. constitu...